{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 构建结构化的程序\n",
    "----\n",
    "截止目前相信大家已经对使用python进行自然语言处理有了一定的了解。然而如果你之前并没有python的基础的话，可能在使用python时还是有各种不爽的地方。所以本章旨在帮您解决如下几个问题：\n",
    "\n",
    "1. 如何构建具有良好结构、可读性强的、能够方便他人复用的程序？\n",
    "2. python中诸如循环、函数以及赋值是如何工作的？\n",
    "3. 使用python可能遇到的坑有哪些，如何避免踩坑？\n",
    "\n",
    "本章中您会通过大量的例子巩固python编程基础，同时学会一些自然语言处理方面有用的可视化技术。如果您对python早有了解，则可以快速略过本章。若没有相关的程序设计经验，则通过仔细学习将会有很大提高。\n",
    "\n",
    "本章中由于需要，将会涉及到许多和NLP技术相关的程序设计概念。我们并不会赘述很多，我们值关注对于NLP最为重要的那一部分。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 本章导读\n",
    "\n",
    "本章主要是介绍Python语言中的基础知识以及一些在自然语言处理中常用到的知识点，是对语言基础的一次补充。\n",
    "\n",
    "主要以以下几个方面进行展开。\n",
    "\n",
    "* 语言基础\n",
    "* 更加Python风格的程序\n",
    "* 函数式编程入门\n",
    "* 程序开发与设计\n",
    "* 算法和相关工具"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 一、语言基础"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 1、is 与 `==` 的区别\n",
    "\n",
    "在python中，使用 `is` 关键字和 '==' 符号进行两个对象之间是否一致的判定。\n",
    "\n",
    "其中 `is` 作用在于判定两个对象是否是同一对象，而 `==` 是对两个对象的值是否一致进行检测。\n",
    "\n",
    "最为直观的例子是深拷贝之后的两个对象，其值是相同的，但是在内存中并不再指向统一区域，因此不是同一个对象，示例如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import copy"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "objA = [1,2,3,4,5,6]\n",
    "objB = copy.deepcopy(objA)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "objA is objB"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "objA == objB"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 运算符重载引起的变化\n",
    "\n",
    "此处还有一点，在进行对象定义时，可以将其默认的成员函数 `__eq__`进行重载，重载之后，使用`==`对两个相同的对象进行euqal的判定时，就会调用这一函数进行全等的判断。\n",
    "\n",
    "其他的常用的python中的魔法命令，请参见：http://pyzh.readthedocs.io/en/latest/python-magic-methods-guide.html"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "class object_equal:\n",
    "    def __init__(self,param_a,param_b):\n",
    "        self.member_a = 0\n",
    "        self.member_b = 1\n",
    "        self.version = '1.0'\n",
    "    def __eq__(self,other):\n",
    "        return self.version == other.version"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [],
   "source": [
    "equ_a = object_equal(0,1)\n",
    "equ_b = object_equal(3,4)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "equ_a == equ_b"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 2、深拷贝与浅拷贝"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 浅拷贝\n",
    "\n",
    "浅拷贝类似于 Pass by Reference 传地址引用。拷贝的新对象和之前的对象在内存中共享一片存储区域\n",
    "\n",
    "```a```和```b```共享同一块内存区域，因此针对```a```的修改对应会同步到```b```上。拷贝过程是，```b```作为一个指针，指向```a```所在的内存的位置。\n",
    "\n",
    "如图所示，浅拷贝在内存中的表现：\n",
    "\n",
    "![浅拷贝图解（图片来自网络）](./pic/shallow copy.png)\n",
    "\n",
    "浅拷贝图解（图片来自网络）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = [[],[],[]]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [],
   "source": [
    "b = a.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "a[1].append(666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], []]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], []]"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "b[2].append(777)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], [777]]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], [777]]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2068325120776\n",
      "2068325290120\n",
      "2068325121864\n"
     ]
    }
   ],
   "source": [
    "for i in a:\n",
    "    print(id(i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2068325120776\n",
      "2068325290120\n",
      "2068325121864\n"
     ]
    }
   ],
   "source": [
    "for p in b:\n",
    "    print(id(p))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 深拷贝\n",
    "深拷贝类似于 Pass by Value 传值引用\n",
    "\n",
    "`c`是`a`的一个值的copy，深拷贝的过程是，先申请和a同样大小的空间，将`a`的值复制到这片空间中。\n",
    "\n",
    "这样过程之后`c`与`a`是两个相互独立的变量，并不共享任何空间。\n",
    "\n",
    "深拷贝的过程如图所示：\n",
    "\n",
    "![](./pic/deep copy.png)\n",
    "\n",
    "深拷贝过程（图片来自网络）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "c = copy.deepcopy(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], [777]]"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "c[0].append(666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[666], [666], [777]]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], [777]]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "a = ()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "tuple"
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "type(a)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[666], [666], [777]]"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "c"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 一个相关的例子\n",
    "\n",
    "Python 中的语法糖，对列表使用乘法能够将当前列表中元素复制多份,然后首位相接。而这样的复制多份实际上也只是浅拷贝，扩充列表，然后将对应位置的指针指向之前的重复内容上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [],
   "source": [
    "p = [1,2,3] * 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1785054960\n",
      "1785054992\n",
      "1785055024\n",
      "1785054960\n",
      "1785054992\n",
      "1785055024\n",
      "1785054960\n",
      "1785054992\n",
      "1785055024\n"
     ]
    }
   ],
   "source": [
    "for q in p:\n",
    "    print(id(q))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "对数组进行赋值时，将对应的位置上的值使用新值进行替换"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [],
   "source": [
    "p[0]= 0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 2, 3, 1, 2, 3, 1, 2, 3]"
      ]
     },
     "execution_count": 52,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "p"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1785054928\n",
      "1785054992\n",
      "1785055024\n",
      "1785054960\n",
      "1785054992\n",
      "1785055024\n",
      "1785054960\n",
      "1785054992\n",
      "1785055024\n"
     ]
    }
   ],
   "source": [
    "for q in p:\n",
    "    print(id(q))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 更直观一些的例子\n",
    "\n",
    "对一个包含着空`list`的`list`进行重复操作。\n",
    "\n",
    "可以看到，空`list`被重复了三份，但是指向同一对象。\n",
    "\n",
    "对一个`list`的操作会同步到其他的`list`上。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list1 = [[]]*3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 46,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [], []]"
      ]
     },
     "execution_count": 46,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 47,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2280831016584\n",
      "2280831016584\n",
      "2280831016584\n"
     ]
    }
   ],
   "source": [
    "for i in list1:\n",
    "    print(id(i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 48,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list1[1].append(666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 49,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[666], [666], [666]]"
      ]
     },
     "execution_count": 49,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "但使用列表生成器的方式创建同样形式的`list`时，其结构和上面的结构已经大有不同。\n",
    "\n",
    "内部的各个`list`成员都是不同的对象。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list2 = [[] for i in range(0,3)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [], []]"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2280831016392\n",
      "2280831278792\n",
      "2280830880840\n"
     ]
    }
   ],
   "source": [
    "for i in list2:\n",
    "    print(id(i))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list2[1].append(666)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[[], [666], []]"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "二者在使用 `is` 和 `==` 时产生的结果也是不一样的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "metadata": {},
   "outputs": [],
   "source": [
    "li1 = [[]]*3\n",
    "li2 = [[] for i in range(3)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "li1 is li2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 56,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "li1 == li2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "li2[1].append(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "li1 == li2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "列表的生成与对比"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list1 = [[]] * 3"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "list2 = [[] for i in range(0,3)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list1 == list2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list1 is list2"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "list2[1].append(1)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "False"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list1 == list2"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 3、`if...else...` 和 `else` 的其他搭配"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 基础用法\n",
    "\n",
    "分支结构，执行判断作用。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1、if...else...\n",
    "\n",
    "当condition_a成立时，进入 `if` 下的block，执行语句`print('a right')`\n",
    "\n",
    "否则进入 `else` 下的block，执行语句`print('a wrong')`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 60,
   "metadata": {},
   "outputs": [],
   "source": [
    "condition_a = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 61,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a right\n"
     ]
    }
   ],
   "source": [
    "if condition_a:\n",
    "    print('a right')\n",
    "else:\n",
    "    print('a wrong')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2、if...elif...else...\n",
    "\n",
    "`elif` 操作类似于 `else if`操作，和相同缩进的最近的if搭配，组成连续的判断体。\n",
    "\n",
    "仅当if条件判断失败时执行`elif`判断。\n",
    "\n",
    "当所有的`if`和`elif`都判断失败时执行`else`语句block中语句。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {},
   "outputs": [],
   "source": [
    "condition_a = False\n",
    "condition_b = True"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 63,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "a wrong b right\n"
     ]
    }
   ],
   "source": [
    "if condition_a:\n",
    "    print('a right')\n",
    "elif condition_b:\n",
    "    print('a wrong b right')\n",
    "else:\n",
    "    print('a wrong b wrong')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "3、for...else....\n",
    "\n",
    "`for` 结构和 `else` 结构的搭配。仅当 `for` 正常完成全部循环时执行 `else` ，否则 `else`  中内容不执行。\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 66,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "Iter 5 times\n"
     ]
    }
   ],
   "source": [
    "for i in range(5):\n",
    "    print('True')\n",
    "else:\n",
    "    print('Iter {0} times'.format(i+1))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 67,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n",
      "True\n"
     ]
    }
   ],
   "source": [
    "for i in range(10):\n",
    "    if i > 5:\n",
    "        break\n",
    "    print('True')\n",
    "else:\n",
    "    print('Iter {0} times'.format(i))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "一个较好的用法是用于查找某一个值是否存在。\n",
    "\n",
    "遍历一个列表，如果值存在，则输出这个值，之后跳出搜索过程。否则提示找不到该值。这样的写法能够减少部分代码量。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 68,
   "metadata": {},
   "outputs": [],
   "source": [
    "search_table = [i*2 for i in range(10)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 69,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "没找到。\n"
     ]
    }
   ],
   "source": [
    "target = 3\n",
    "for i in search_table:\n",
    "    if i == target:\n",
    "        print('我找到啦！')\n",
    "else:\n",
    "    print('没找到。')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "4、try...except...else...finally\n",
    "\n",
    "`try`和`except`搭配，对`try` block内部的语句错误进行捕获处理。\n",
    "\n",
    "而当对应的`try...except`结构没有能够捕获对应错误时，就会执行`else` block中的语句。\n",
    "\n",
    "`finally`中的语句，是对`try...except...else...` 结构的收尾，无论前面执行情况如何都会执行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "def divide(x, y):\n",
    "    try:\n",
    "        result = x / y\n",
    "    except ZeroDivisionError:\n",
    "        print(\"division by zero!\")\n",
    "    else:\n",
    "        print(\"result is\", result)\n",
    "    finally:\n",
    "        print(\"executing finally clause\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "division by zero!\n",
      "executing finally clause\n"
     ]
    }
   ],
   "source": [
    "divide(2,0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 72,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "result is 2.0\n",
      "executing finally clause\n"
     ]
    }
   ],
   "source": [
    "divide(2,1)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到`else`中的语句重新引发了`try...except...`中没有捕获的错误。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 73,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "executing finally clause\n"
     ]
    },
    {
     "ename": "TypeError",
     "evalue": "unsupported operand type(s) for /: 'str' and 'str'",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-73-305a5292c268>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mdivide\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m'wo'\u001b[0m\u001b[1;33m,\u001b[0m\u001b[1;34m'si'\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;32m<ipython-input-70-fb375c737bea>\u001b[0m in \u001b[0;36mdivide\u001b[1;34m(x, y)\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[1;32mdef\u001b[0m \u001b[0mdivide\u001b[0m\u001b[1;33m(\u001b[0m\u001b[0mx\u001b[0m\u001b[1;33m,\u001b[0m \u001b[0my\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      2\u001b[0m     \u001b[1;32mtry\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[1;32m----> 3\u001b[1;33m         \u001b[0mresult\u001b[0m \u001b[1;33m=\u001b[0m \u001b[0mx\u001b[0m \u001b[1;33m/\u001b[0m \u001b[0my\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m\u001b[0;32m      4\u001b[0m     \u001b[1;32mexcept\u001b[0m \u001b[0mZeroDivisionError\u001b[0m\u001b[1;33m:\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0;32m      5\u001b[0m         \u001b[0mprint\u001b[0m\u001b[1;33m(\u001b[0m\u001b[1;34m\"division by zero!\"\u001b[0m\u001b[1;33m)\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n",
      "\u001b[1;31mTypeError\u001b[0m: unsupported operand type(s) for /: 'str' and 'str'"
     ]
    }
   ],
   "source": [
    "divide('wo','si')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 4、`list` 和 `tuple`区别\n",
    "\n",
    "`list` 为pyhton中序列结构，其存储为链表形式。在认知上一般来讲应当是存储同质数据的数据结构。即某一类对象的一序列形式的结果。\n",
    "\n",
    "`tuple` 为python中用处较多的结构，一些默认方法往往会将 `tuple` 作为中间结果。`tuple`对象是不可变的，即组成结构是稳定的，一旦对象被创建，任何改变`tuple`结构的操作都是不允许的。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以看到`tuple`创建之后自身的成员无法进行指定修改，但是成员对象本身的一些操作是允许的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "metadata": {},
   "outputs": [],
   "source": [
    "tp1 = tuple(([],1,2,3,'name'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 84,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "([], 1, 2, 3, 'name')"
      ]
     },
     "execution_count": 84,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tp1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 85,
   "metadata": {},
   "outputs": [
    {
     "ename": "TypeError",
     "evalue": "'tuple' object does not support item assignment",
     "output_type": "error",
     "traceback": [
      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m                                 Traceback (most recent call last)",
      "\u001b[1;32m<ipython-input-85-fc6b0b59b476>\u001b[0m in \u001b[0;36m<module>\u001b[1;34m()\u001b[0m\n\u001b[1;32m----> 1\u001b[1;33m \u001b[0mtp1\u001b[0m\u001b[1;33m[\u001b[0m\u001b[1;36m2\u001b[0m\u001b[1;33m]\u001b[0m \u001b[1;33m=\u001b[0m \u001b[1;36m1\u001b[0m\u001b[1;33m\u001b[0m\u001b[0m\n\u001b[0m",
      "\u001b[1;31mTypeError\u001b[0m: 'tuple' object does not support item assignment"
     ]
    }
   ],
   "source": [
    "tp1[2] = 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 86,
   "metadata": {},
   "outputs": [],
   "source": [
    "tp1[0].append(123)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 87,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "([123], 1, 2, 3, 'name')"
      ]
     },
     "execution_count": 87,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "tp1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`tuple` 对象可以作为`dict`对象的键"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 88,
   "metadata": {},
   "outputs": [],
   "source": [
    "dict1 = {\n",
    "    (1,2):'zhang',\n",
    "    (2,3):'wang',\n",
    "}"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 89,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{(1, 2): 'zhang', (2, 3): 'wang'}"
      ]
     },
     "execution_count": 89,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "dict1"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 5、`generator` 用法\n",
    "\n",
    "一些情况下使用生成器能够节省时间和空间开销。\n",
    "\n",
    "![](./pic/generator save time.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import nltk"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "text = 'The second line uses a generator expression. This is more than a notational convenience: in many language processing situations, generator\\\n",
    "expressions will be more efficient. In , storage for the list object must be allocated before the value of max() is computed. If the text is very large,\\\n",
    "this could be slow. In , the data is streamed to the calling function. Since the calling function simply has to find the maximum value — the word\\\n",
    "which comes latest in lexicographic sort order — it can process the stream of data without having to store anything more than the maximum value\\\n",
    "seen so far.'\n",
    "text = text *10000"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loop, best of 3: 12.2 s per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit max([w.lower() for w in nltk.word_tokenize(text)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loop, best of 3: 12.2 s per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit max(w.lower() for w in nltk.word_tokenize(text))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# 6、`iterator` 用法\n",
    "\n",
    "可以通过重载对象的\\_\\_next\\_\\_方法和 \\_\\_iter\\_\\_方法来实现自定义的迭代"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Reverse:\n",
    "    def __init__(self,_list):\n",
    "        self.__list = _list\n",
    "        self.__len = len(_list)\n",
    "    def __next__(self):\n",
    "        if self.__len == 0:\n",
    "            raise StopIteration\n",
    "        else:\n",
    "            self.__len -= 1\n",
    "            return self.__list[self.__len]\n",
    "    def __iter__(self):\n",
    "        self.__len = len(self.__list)\n",
    "        return self"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "rev = Reverse([i for i in range(0,10)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[9, 8, 7, 6, 5, 4, 3, 2, 1, 0]"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[i for i in rev]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 二、Pythonic\n",
    "\n",
    "把程序写的更加Python化。\n",
    "\n",
    "一般而言，Pythonic指书写更加具有python风格的程序，python语言本身融合了命令式语言、函数式语言、脚本语言、解释语言的特点。其拥有诸多语法糖在程序循环控制、分支控制、错误处理以及其他方面有着优化。目标在于使得程序结构更加清晰明了，减少重复和不必要的浪费，节省时间。\n",
    "\n",
    "![](./pic/pythonic what.png)\n",
    "什么是pythonic\n",
    "\n",
    "![](./pic/pythonic why.png)\n",
    "为什么要写的更pythonic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 不是那么Pythonic的代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 44,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dict_process(input_dict):\n",
    "    '''C/C++ like'''\n",
    "    count = 0\n",
    "    max_len = len(input_dict.keys())\n",
    "    while count < max_len:\n",
    "        print('Current key number:{0}'.format(count))\n",
    "        key = input_dict.keys()[count]\n",
    "        input_dict[key] += 17\n",
    "        count += 1\n",
    "    return input_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 更加Pyhtonic的代码"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 45,
   "metadata": {},
   "outputs": [],
   "source": [
    "def dict_process_pynic(input_dict):\n",
    "    '''python way'''\n",
    "    for num,key in enumerate(input_dict.keys()):\n",
    "        print('Current key number:{0}'.format(num))\n",
    "        input_dict[key] += 17\n",
    "    return input_dict"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 如何将代码写的更pythonic"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](./pic/pythonic how1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![](./pic/pythonic how2.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 三、Function 与 函数式编程\n",
    "\n",
    "详情请见附件的ppt内容。较为详细。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 43,
   "metadata": {},
   "outputs": [],
   "source": [
    "def swap(first,second):\n",
    "    '''\n",
    "    this function for swap two parameter\n",
    "    input:\n",
    "        first means first parameter\n",
    "        second means second parameter\n",
    "    \n",
    "    return:\n",
    "        two value swap their location\n",
    "    '''\n",
    "    new_first, new_second = second, first\n",
    "    return new_first, new_second"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "swap()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 42,
   "metadata": {},
   "outputs": [],
   "source": [
    "def swap1(a,b):\n",
    "    b,a=a,b\n",
    "    return a,b"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "metadata": {},
   "outputs": [],
   "source": [
    "def use_position_arg(lng,lat,*pos_arg):\n",
    "    print(pos_arg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "('Raynor', 'Terran')\n"
     ]
    }
   ],
   "source": [
    "use_position_arg(24,36,'Raynor','Terran')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 52,
   "metadata": {},
   "outputs": [],
   "source": [
    "def use_keywords_arg(lng,lat,**keyword_arg):\n",
    "    print(keyword_arg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'Name': 'Raynor', 'Species': 'Terran'}\n"
     ]
    }
   ],
   "source": [
    "use_keywords_arg(24,36,Name='Raynor',Species='Terran')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "函数自定义参数的一些使用方法"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 56,
   "metadata": {},
   "outputs": [],
   "source": [
    "def both_arg(lng,lat,*p_arg,**k_arg):\n",
    "    print('p_arg:',p_arg)\n",
    "    print('k_arg:',k_arg)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "p_arg: ('laotie', 666)\n",
      "k_arg: {'Species': 'Terran', 'name': 'Raynor'}\n"
     ]
    }
   ],
   "source": [
    "both_arg(24,36,'laotie',666,Species='Terran',name='Raynor')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## isInstance check 检查实例是否一致"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 70,
   "metadata": {},
   "outputs": [],
   "source": [
    "def order_by_dict_value(not_order):\n",
    "    assert isinstance(not_order,dict),'input must be dictionary'\n",
    "    ordered_key = sorted(not_order,key=lambda x:not_order[x],reverse=True)\n",
    "    new_dict= {}\n",
    "    for key in ordered_key:\n",
    "        new_dict[key] = not_order[key]\n",
    "    return new_dict"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 71,
   "metadata": {
    "scrolled": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "{'c': 3123, 'a': 65, 'b': 43, 'd': 1}"
      ]
     },
     "execution_count": 71,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "order_by_dict_value({'a':65,'b':43,'c':3123,'d':1})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# generator 函数式编程"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [],
   "source": [
    "def multiable():\n",
    "    '''乘法表'''\n",
    "    for i in range(1,10):\n",
    "        for j in range(1,10):\n",
    "            if j<i:\n",
    "                continue\n",
    "            else:\n",
    "                yield '{0}*{1}'.format(i,j)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [],
   "source": [
    "def _multiable():\n",
    "    '''乘法表'''\n",
    "    table = []\n",
    "    for i in range(1,10):\n",
    "        for j in range(1,10):\n",
    "            if j<i:\n",
    "                continue\n",
    "            else:\n",
    "                table.append('{0}*{1}'.format(i,j))\n",
    "    return table"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "[i for i in multiable()]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "_multiable()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# map & reduce 函数式编程"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [],
   "source": [
    "from functools import reduce"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reduce_sum(list_of_number):\n",
    "    return reduce(lambda x,y:x+y,list_of_number,0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "45"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reduce_sum([i for i in range(0,10)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [],
   "source": [
    "def reduce_mean(list_of_number):\n",
    "    return reduce(lambda x,y:(x+y),list_of_number,0)/len(list_of_number)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "4.5"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "reduce_mean([i for i in range(0,10)])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [],
   "source": [
    "def square(numbers):\n",
    "    return map(lambda x:x*x,numbers)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[p for p in square([i for i in range(0,10)])]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# filter 函数式编程"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [],
   "source": [
    "def odd(list_of_int):\n",
    "    return filter(lambda x:x%2==1,list_of_int)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[1, 3, 5, 7, 9]"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "[p for p in odd([i for i in range(0,10)])]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [default]",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
